(54) Compression of documents with markup language that preserves syntactical structure



(57) Communication channels between mobile telephones and networks
like the Internet have very limited bandwidths. The transmission of
documents expressed in various forms like markup languages is made
more efficient by compressing document elements into codes such that
syntactical characteristics of the elements can be determined readily
from the encoded representations. An indication of the presence of
syntax information like markup language tag attributes and content is
conveyed in a position relative to the code beginning that is
predefined. Preferably, the position is independent of the type of
element that is represented. In this manner, compressed or encoded
representations of document elements can be processed efficiently
without need for expansion or decoding. In addition, future
extensions to the markup language can be processed efficiently by
existing encoders and decoders that are not cognizant of the new
extensions.
Description



[0001] The present invention pertains generally to the compression of
information for transmission to devices over very low bandwidth
communication channels. More particularly, the present invention
pertains to the compression of information having a syntactical
structure, such as a document description that conforms to a
generalized markup language, for transmission to a wireless device
such as a handheld mobile telephone.
[0002] Networks like the Internet have been in existence for years;
however, they have not been a popular medium of information exchange
until very recently. The recent explosive growth in usage of the
Internet, for example, is due in large part to the development of
devices and methods that simplify the actions a user must take to
access and peruse multimedia information stored across a network of
servers. References to resources, known as hyperlinks, allow
disparate pieces of information to be organized in nonsequential ways
and allow a user to easily navigate among the linked information. By
assigning a unique identifier, known as a Uniform Resource Locator
(URL), to each distinct piece of multimedia information available
throughout a network, information can be readily accessed without
regard to where it is stored. Network clients and servers
participating in such a "hypermedia" network are referred to herein
as hypermedia clients and hypermedia servers, respectively.

[0003] One significant development that has contributed to this
growth is the use of facilities such as "markup languages" and
associated processes to define and implement a broad variety of
elements specifying various syntactical characteristics of documents.
Many markup languages in use today conform to international standard
ISO 8879:1986, which defines a set of basic rules for a tag-based
language referred to herein as the Standard Generalized Markup
Language (SGML). Perhaps the most widely used markup language on the
Internet that conforms to SGML is the Hypertext Markup Language
(HTML).
[0004] Documents that are represented by a tag-based markup language
are typically displayed and manipulated by software applications
called browsers or readers. These software applications implement
processes conforming to the appropriate markup language rules to
parse and interpret information representing documents so that the
documents can be displayed properly.

[0005] Information representing a document according to a SGML-like
markup language generally comprises several elements that have tags
and possibly associated tag attributes and tag content. These
elements convey syntactical characteristics of the information
conveyed in the document.
[0006] A tag identifies the element type. In HTML, for example, the
element that represents the entire document is identified by tags
marking the start and end of the document, elements representing a
paragraph of text are identified by a tag that marks the start of the
paragraph, and text that is to be displayed with an underline is
identified by tags that mark the start and end of the underlining.

[0007] Tag attributes provide information that specifies one or more
characteristics of the element. A tag that represents an image file
to be embedded into a document, for example, includes an attribute
that specifies the name of the image file to be embedded. According
to the specification of a markup language, a tag attribute may be
optional or required according to the associated tag type.

[0008] Tag content represents information that is generally intended
to be displayed or otherwise available for manipulation by a user.
Tag content may be optional or required according to the type of tag,
and it may contain other "nested" elements which in turn have tags,
attributes and content.

[0009] Markup languages such as those that conform to SGML can
provide very flexible and powerful facilities for implementing
document elements because SGML itself is very flexible. This
flexibility is not without cost. Additional bandwidth is required to
convey the tags and tag attributes and additional resources are
required to parse and interpret the tags and tag attributes. In HTML,
tags and attributes are expressed by character strings in a form
similar to <tagid name=value> where tagid is the tag identifier, name
is the name of an attribute and value is the value assigned to that
attribute. A tag may have more than one attribute.
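As an illustration of the textual form described above, the following Python sketch splits a start tag of the form <tagid name=value> into its identifier and its attributes. The function name parse_tag and both regular expressions are assumptions made for this illustration only; they handle simple start tags and are not a complete SGML or HTML parser.

```python
import re

# Hypothetical patterns for a simple start tag: <tagid name=value ...>
TAG_RE = re.compile(
    r"<\s*([A-Za-z][\w-]*)"                               # tag identifier
    r"((?:\s+[\w-]+\s*=\s*(?:\"[^\"]*\"|[^\s>]+))*)"      # zero or more attributes
    r"\s*>"
)
ATTR_RE = re.compile(r"([\w-]+)\s*=\s*(?:\"([^\"]*)\"|([^\s>]+))")


def parse_tag(text):
    """Split a start tag into (tagid, {attribute name: attribute value})."""
    match = TAG_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a simple start tag: {text!r}")
    attributes = {}
    for name, quoted, bare in ATTR_RE.findall(match.group(2)):
        # A value is either quoted or bare; exactly one group is non-empty.
        attributes[name] = quoted if quoted else bare
    return match.group(1), attributes


if __name__ == "__main__":
    print(parse_tag("<tagid name=value>"))
    print(parse_tag('<IMG src="a.gif">'))
```

Every character of the tag identifier, attribute name and attribute value is transmitted as text in this form, which is the bandwidth cost the paragraph refers to.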
[0010] The additional bandwidth and resources required to convey and
process the tags and tag attributes is not a significant disadvantage
in many situations because personal computers and other workstations
with sufficient computing power and communication channels with
sufficient bandwidth are readily available.

[0011] There is, however, a growing interest to provide access to
hypermedia servers connected to networks such as the Internet through
mobile devices, particularly handheld devices like wireless
telephones. These devices are characterized by severe limitations in
processing power and memory space. Furthermore, the bandwidth of the
communication channels connecting the mobile devices to the rest of
the network is also severely limited.

[0012] A wireless telephone has only a small fraction of the
resources provided by a typical desktop or portable computer.
Typically, the processing power is less than one percent of the
processing power in many computers and the memory space is generally
much less than 150 kilobytes (kB). The communication path is often in
the range of 400 to 19,200 bits per second and the cost of using that
communication path is measured in terms of United States dollars per
100 kB or more.
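To make these channel rates concrete, the short Python calculation below estimates how long a document takes to cross such a channel. The 4,000-byte document size is an assumed figure chosen only for illustration, and protocol overhead is ignored.

```python
def transfer_seconds(size_bytes, bits_per_second):
    """Idealized transfer time: payload bits divided by the channel rate."""
    return size_bytes * 8 / bits_per_second


if __name__ == "__main__":
    document_bytes = 4000  # assumed document size, not taken from the text
    for rate in (400, 19200):
        seconds = transfer_seconds(document_bytes, rate)
        print(f"{rate:>6} bit/s: {seconds:.1f} s")
```

At the low end of the stated range the assumed document needs well over a minute, so every byte removed by compression shortens the wait in direct proportion.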
[0013] The limited bandwidth of these communication channels can be
used more effectively by reducing the capacity requirements of the
information conveyed along these channels. Information capacity
requirements can be reduced by employing some form of information
compression.

[0014] General purpose compression schemes such as Huffman encoding
have been considered but, unfortunately, general purpose schemes are
not attractive because the resulting compressed information obscures
the syntactical characteristics of the underlying information. In
other words, the identity of tags and the presence or absence of tag
attributes and tag content cannot be easily determined from the
compressed representation. Furthermore, general purpose compression
schemes usually cannot reduce information capacity requirements as
much as compression schemes that are based on a specific markup
language.
[0015] Various compression schemes based on specific markup languages
such as HTML have been considered. This type of compression scheme is
able to achieve higher levels of compression by exploiting known
characteristics of the specific markup language. For example, a
markup-language specific compression scheme need not allow for the
possibility of conveying tag content for those tags which cannot have
content. Unfortunately, these schemes require that the browser or
expansion process be able to process or expand all compressed
elements. Extensions or changes to a markup language cannot be
recovered from a compressed representation unless browsers are
modified to process the new language features; otherwise, compression
of the new feature obscures syntactical characteristics of those
elements incorporating the new feature as well as any nested
elements. Significantly, a browser must be modified even if it is
incorporated into an application or device that cannot use or does
not need to use the new feature.
[0016] For example, if a markup language-based compression scheme is
extended to compress a new display format, a browser cannot recover
that display format information from the compressed representation
unless the browser is modified to include the processing required to
expand the new feature. Furthermore, without such modification, the
browser may not be able to ignore or skip the new feature and expand
the remaining information because its processing capabilities are
unable to determine the extents of the new compressed feature.

[0017] It is an object of the present invention to reduce 
the bandwidth and resources required to convey and 
process information representing documents in a way 
that does not obscure syntactical characteristics of the 
underlying document elements. 
[0018] According to one aspect of the present invention, a method for
reducing capacity requirements of input information representing a
document comprises receiving the input information and identifying a
plurality of elements therein, each element having a respective type
and at least some of the elements having syntax information
representing one or more respective syntactical characteristics,
generating a plurality of codes, a respective code having a beginning
and representing at least a portion of a respective element in a form
having an information capacity requirement that is lower than the
information capacity requirement of the represented portion, the
respective code conveying the respective element type and a syntax
indication indicating presence or absence of syntax information for
the respective element, and the respective code conveying the syntax
indication in a predefined position relative to the beginning of the
respective code, and generating encoded information representing the
document by assembling the plurality of codes and portions of the
plurality of elements not represented by the plurality of codes into
a form suitable for transmission or storage.
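The encoding aspect can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the encoding defined by the invention: the one-byte code layout (two high bits as the syntax indication, six low bits as the element type), the TAG_IDS and ATTR_IDS tables, the END and TEXT markers and the length-prefixed strings are all hypothetical. The text above requires only that the syntax indication sit at a predefined position relative to the beginning of the code.

```python
# All constants below are assumptions made for this sketch.
END, TEXT = 0x00, 0x01                 # hypothetical end marker and text marker
HAS_ATTRS, HAS_CONTENT = 0x80, 0x40    # syntax indication: fixed bits of the code
TAG_IDS = {"HTML": 2, "BODY": 3, "IMG": 4, "B": 5, "A": 6}
ATTR_IDS = {"src": 1, "href": 2}


def encode_string(text):
    """Length-prefixed UTF-8 string (at most 255 bytes in this sketch)."""
    raw = text.encode("utf-8")
    if len(raw) > 255:
        raise ValueError("string too long for one-byte length prefix")
    return bytes([len(raw)]) + raw


def encode_element(tag, attrs=(), content=()):
    """Encode one element; content items are str or (tag, attrs, content)."""
    code = TAG_IDS[tag]
    if attrs:
        code |= HAS_ATTRS      # same bit position for every element type
    if content:
        code |= HAS_CONTENT
    out = bytearray([code])
    if attrs:
        for name, value in attrs:
            out.append(ATTR_IDS[name])
            out += encode_string(value)
        out.append(END)        # end of attributes
    if content:
        for item in content:
            if isinstance(item, str):
                out.append(TEXT)
                out += encode_string(item)
            else:
                out += encode_element(*item)   # nested element
        out.append(END)        # end of content
    return bytes(out)


if __name__ == "__main__":
    encoded = encode_element("B", content=["bold"])
    print(encoded.hex(), len(encoded), "bytes vs", len("<B>bold</B>"), "as text")
```

Because the two flag bits occupy the same position in every code, a reader can learn whether attributes or content follow without consulting any table of element types.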
[0019] According to another aspect of the present invention, a method
for recovering a document comprising a plurality of elements from
encoded information comprises receiving the encoded information
representing the document and identifying a plurality of codes
therein, where a respective code has a beginning, represents at least
a portion of a respective element, conveys a respective type
indication indicating the respective element type and conveys a
respective syntax indication indicating presence or absence of syntax
information representing one or more syntactical characteristics of
the respective element, obtaining respective syntax indications from
respective codes at predefined positions relative to the beginning of
the respective codes, generating a plurality of decoded
representations, where a respective decoded representation is derived
from a respective code and corresponds to the portion of the
respective element that is represented by the respective code, where
the respective syntax indication controls generation of decoded
representations that represent syntax information and are derived in
a manner such that information capacity requirements of a respective
decoded representation is greater than information capacity
requirements of the respective code, and assembling the plurality of
decoded representations and portions of the plurality of elements not
represented by the codes to generate output information representing
the document.

[0020] According to yet another aspect of the present invention, a
method for recovering a document from a plurality of encoded elements
in a compressed form comprises processing an encoded element to
identify element type and to obtain a syntax indication of element
syntactical characteristics, where the syntax indication is obtained
from a predefined position within the encoded element relative to the
encoded element beginning and a compressed representation of the
element type is expanded into an uncompressed form of a markup
language tag, if the syntax indication indicates that at least one
tag attribute is present, processing tag attribute information in the
encoded element by expanding a compressed representation of the tag
attribute information into an uncompressed form of a markup language
tag-attribute name or a tag-attribute value, and if the syntax
indication indicates that tag content is present, processing the tag
content information in the encoded element according to a process
appropriate for the tag content.
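The recovery steps just described can be sketched in Python. The byte layout used here is hypothetical and chosen only for illustration: a one-byte code whose two high bits are the syntax indication, END (0x00) and TEXT (0x01) markers, one-byte attribute-name codes, and length-prefixed strings. The TAGS and ATTR_NAMES tables are likewise assumptions.

```python
# All constants and tables are assumptions made for this sketch.
END, TEXT = 0x00, 0x01
HAS_ATTRS, HAS_CONTENT, ID_MASK = 0x80, 0x40, 0x3F
TAGS = {2: "HTML", 3: "BODY", 4: "IMG", 5: "B", 6: "A"}
ATTR_NAMES = {1: "src", 2: "href"}


def read_string(data, pos):
    """Read a length-prefixed UTF-8 string; return (text, next position)."""
    length = data[pos]
    start = pos + 1
    return data[start:start + length].decode("utf-8"), start + length


def decode_element(data, pos=0):
    """Expand one encoded element into markup text; return (markup, next position)."""
    code = data[pos]
    pos += 1
    name = TAGS[code & ID_MASK]            # expand the element type into a tag
    markup = "<" + name
    if code & HAS_ATTRS:                   # syntax indication: attributes present
        while data[pos] != END:
            attr = ATTR_NAMES[data[pos]]
            value, pos = read_string(data, pos + 1)
            markup += f' {attr}="{value}"'
        pos += 1                           # consume the attribute end marker
    markup += ">"
    if code & HAS_CONTENT:                 # syntax indication: content present
        while data[pos] != END:
            if data[pos] == TEXT:
                text, pos = read_string(data, pos + 1)
                markup += text
            else:                          # nested element
                inner, pos = decode_element(data, pos)
                markup += inner
        pos += 1                           # consume the content end marker
        markup += f"</{name}>"
    return markup, pos


if __name__ == "__main__":
    sample = bytes([0x45, TEXT, 4]) + b"bold" + bytes([END])
    print(decode_element(sample))
```

The decoder reads the two flag bits before it looks up the element type, so the order of the three steps in the paragraph (type, attributes, content) maps directly onto the three branches of the function.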

[0021] The various features of the present invention and its
preferred embodiments may be better understood by referring to the
following discussion and the accompanying drawings in which like
reference numerals refer to like elements in the several figures. The
contents of the following discussion and the drawings are set forth
as examples only and should not be understood to represent
limitations upon the scope of the present invention.

Fig. 1 is a schematic illustration of the major components of a
system in which various aspects of the present invention may be
carried out.

Fig. 2 is a block diagram of a process or device for generating a
compressed representation of document elements.

Fig. 3 is a block diagram of a process or device for recovering
document elements from a compressed representation.

Fig. 4 is a state diagram of a process for generating a compressed
representation of document elements.

Fig. 5 is a state diagram of a process for recovering document
elements from a compressed representation.

Fig. 6 is a functional flow diagram of a process for either
compressing or expanding document information.

Fig. 7 illustrates a simple document expressed in a markup language.

Fig. 8 is a schematic illustration of encoded information
representing the document of Fig. 7 prepared by an encoding process
according to the present invention.

Overview 

[0022] Fig. 1 illustrates in schematic form a system in which various
aspects of the present invention may be practiced. Some of the
components illustrated in the figure may be omitted in various
embodiments. As shown, client 1 uses network 40 to access resources
provided by server 51 and server 52. Although it is contemplated that
server 51 and server 52 are hypermedia servers, perhaps operating in
conformity with the Hypertext Transfer Protocol (HTTP), this is not
necessary to practice the present invention. In typical embodiments,
remote device 11 provides a user interface through which information
can be presented to a user and input can be received from a user, and
computer 31 exchanges information with network 40 in a manner that is
consistent with a conventional network client.



[0023] Computer 31 stores parameters and information in storage 32
that typically is a combination of random access memory (RAM),
read-only memory (ROM) and long-term storage devices such as magnetic
and optical disk drives. Computer 31 communicates with remote device
11 through receiver 21 and transmitter 22. Information that is sent
by computer 31 through transmitter 22 is received by remote device 11
through receiver 16. Information that is sent by remote device 11
through transmitter 15 is received by computer 31 through receiver
21.

[0024] In the embodiment shown in Fig. 1, remote device 11 comprises
display 12, one or more buttons 13, storage 14, transmitter 15 and
receiver 16. For example, device 11 may be a wireless telephone such
as a MobileAccess™ telephone by Mitsubishi Wireless Communications,
Inc., or a Duette telephone by Samsung Electronics Corporation. In
typical wireless telephones, the display 12 is a liquid crystal
display (LCD) panel. Buttons 13 represent one or more data entry
devices such as switches, keys or buttons. Storage 14 represents
memory circuits or other devices that are capable of storing digital
information. Preferably, at least part of storage 14 is persistent
storage, meaning that information is retained when device 11 is
turned off. In some embodiments, a portion of storage 14 is organized
into a unified push/pull cache. It is also contemplated that a
portion of storage 14 will store program instructions, either in
persistent memory or in ROM, and that device 11 will comprise a
microprocessor or other type of processing circuitry capable of
executing the program instructions.

[0025] The nature of the communication paths shown between computer
31, server 51, server 52, receiver 21 and transmitter 22 is not
critical to the practice of the present invention and may be
implemented as switched and/or non-switched paths using private
and/or public facilities, for example. Similarly, the topology of
network 40 is not critical and may be implemented in a variety of
ways including hierarchical and peer-to-peer networks. Computer 31
and server 51 may be located locally with respect to one another and
may be implemented on the same hardware.

[0026] The nature of the communication paths between computer 31 and
device 11 also is not critical to the practice of the present
invention; however, in many applications device 11 is a wireless
device that uses a communication technology such as electromagnetic
transmission in the radio-frequency to infrared portions of the
spectrum. In applications where device 11 is a wireless telephone, a
cellular telephone for example, transmitter 15, receiver 16, receiver
21 and transmitter 22 represent communication facilities used for
normal telephone calls.

Remote Device 

[0027] In applications where remote device 11 and computer 31
implement client 1 as a HTTP client, device 11 provides at least
three basic functions: (1) a navigation function allows a user to
navigate or traverse HTTP Uniform Resource Locator (URL) hyperlinks,
(2) a communication function exchanges information with computer 31,
and (3) an interface function provides a user interface through which
information may be presented to the user and through which input may
be received from the user.

[0028] Preferably, these functions are implemented by a
software-controlled process using an event-driven architecture.
Events may be initiated by a user through buttons 13, for example, or
may be initiated by signals received through receiver 16. The
navigation function operates in either of two states. In the "ready"
state the device awaits user input specifying a hyperlink to
traverse. In the "pending" state the communication function has
submitted a request to computer 31 and the device is waiting for a
reply from computer 31. In terms of the HTTP, the ready state waits
for user input specifying the URL of a hypermedia entity to display
or process and the pending state waits for computer 31 to provide a
requested hypermedia entity.
[0029] In one embodiment, hypermedia information is exchanged with
computer 31 according to the Handheld Device Transfer Protocol
(HDTP). A version of this protocol is described in the "HDTP
Specification," part number HDTP-SPEC-DOC-101, published July 15,
1997 by Unwired Planet, Inc., Redwood Shores, California, which is
incorporated herein by reference in its entirety. The HDTP resembles
the HTTP but is optimized for use with remote devices like wireless
telephones and preferably is conveyed using the User Datagram
Protocol/IP (UDP/IP). The UDP/IP is generally regarded as being less
reliable than TCP/IP, for example, because it does not guarantee that
packets will be received, nor does it guarantee that packets will be
received in the same order that they are sent. Datagram protocols
like the UDP/IP are attractive in practicing the present invention,
however, because they do not require a "connection" to be established
between a sender and a receiver before information can be exchanged.
This eliminates the need to exchange a large number of packets during
session creation.
[0030] In a preferred embodiment, hypermedia information is organized
according to the Handheld Device Markup Language (HDML) into cards
and decks. Multiple decks and other types of message entities can be
organized into information structures called digests. A version of
this markup language is described in the "HDML 2.0 Specification,"
part number HDML-SPEC-DOC-200, Revision A, published March 1997 by
Unwired Planet, Inc., which is incorporated herein by reference in
its entirety.

Intermediate Computer 

[0031] According to the embodiment discussed here, computer 31
together with remote device 11 provide the functions of a
conventional hypermedia client. In this embodiment, computer 31
receives information from remote device 11 according to the HDTP,
translates the HDTP information into corresponding HTTP information
as necessary, and sends the result to server 51. Similarly, computer
31 receives information from server 51 according to the HTTP,
translates the HTTP information into corresponding HDTP information
as necessary, and sends the result to remote device 11. HDTP
information exchanged between computer 31 and remote device 11 is
compressed according to the present invention to reduce information
capacity requirements and to reduce the processing required by remote
device 11 to parse and interpret the information. This compression
and the complementary expansion is carried out by encoding and
decoding processes performed in remote device 11 and in computer 31.

Processes

[0032] Fig. 2 illustrates one embodiment of an encoding process
according to the present invention for generating a compressed
representation of document elements. Identify-elements 62 receives
from path 61 information representing a document and identifies a
plurality of elements within that information. Elements typically
have syntax information that represents at least some aspect of the
structure and syntactical characteristics of the document.

[0033] Encode 64 generates a plurality of codes representing at least
a portion of at least some of the document elements. At least some of
the codes impose lower information capacity requirements than the
element information that is represented. The codes convey the type of
the element that is represented as well as an indication whether
syntactical information for the element is present. Preferably, at
least some of the syntactical information is encoded in a manner that
lowers information capacity requirements. Element information is
passed along path 63 as necessary to process any nested information.
Nested information may be processed in a variety of ways including
recursive processes.

[0034] Assemble 66 generates encoded information along path 67
representing a document by assembling the codes generated by encode
64 and any elements or portions of elements that are not represented
by those codes into a form that is suitable for transmission or
storage.

[0035] Another embodiment of the present invention includes code book
68 that provides a plurality of code books. Encode 64 adaptively
selects a code book from this plurality of code books and generates
one or more codes according to the selected code book. An indication
of the selected code book is included with the encoded information.
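One way code book selection could work is sketched below in Python. The selection rule (choose the book that covers the most tags in the document), the contents of CODE_BOOKS and the use of a single leading byte as the code book indication are all assumptions for illustration; the text does not specify how the adaptive selection is made or how the indication is conveyed.

```python
# Hypothetical code books: each maps tag names to one-byte codes.
CODE_BOOKS = [
    {"HTML": 2, "BODY": 3, "IMG": 4},   # book 0
    {"B": 2, "A": 3, "IMG": 4},         # book 1
]


def select_code_book(tag_names):
    """Pick the index of the book covering the most tags (assumed rule)."""
    def coverage(index):
        return sum(1 for tag in tag_names if tag in CODE_BOOKS[index])
    return max(range(len(CODE_BOOKS)), key=coverage)


def encode_tags(tag_names):
    """Emit the code book indication followed by one code per tag."""
    index = select_code_book(tag_names)
    book = CODE_BOOKS[index]
    missing = [tag for tag in tag_names if tag not in book]
    if missing:
        raise ValueError(f"tags not in selected code book: {missing}")
    return bytes([index] + [book[tag] for tag in tag_names])


def decode_tags(data):
    """Read the indication, then expand codes with the indicated book."""
    book = CODE_BOOKS[data[0]]
    names = {code: tag for tag, code in book.items()}
    return [names[code] for code in data[1:]]


if __name__ == "__main__":
    encoded = encode_tags(["B", "A", "B"])
    print(encoded.hex(), decode_tags(encoded))
```

Note that the same code value (2, for instance) stands for different tags in different books, which is why the indication of the selected book must travel with the encoded information.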

[0036] Fig. 3 illustrates one embodiment of a decoding process
according to the present invention for recovering document elements
from an encoded representation. Identify-codes 72 receives from path
71 encoded information representing a document and identifies a
plurality of codes that each represent at least a portion of a
respective document element.
[0037] In response to the codes, decode 74 obtains syntax indications
and generates decoded representations. At least some of the decoded
representations impose greater information capacity requirements than
the corresponding codes. The syntax indications indicate the presence
or absence of syntax information representing one or more syntactical
characteristics of the document. Decoded representations are passed
along path 73 as necessary and processed to handle any nested codes.
Nested codes may be processed in a variety of ways including
recursive processes.
[0038] Assemble 76 generates output information along path 77
representing the document by assembling the decoded representations
generated by decode 74 and any elements or portions of elements that
are not represented by those codes.
[0039] Another embodiment of the present invention includes code book
78 that provides a plurality of code books. Decode 74 adaptively
selects a code book from this plurality of code books in response to
an indication of a selected code book within the encoded information,
and generates one or more decoded representations according to the
selected code book.
[0040] In yet another embodiment of the present invention, process 80
receives output information from path 77 and generates signals along
path 81 that represent a presentation for display. In certain
situations, decode 74 may encounter codes that cannot be decoded
because the codes are unknown or unsupported by the decoding process.
Decode 74 may pass these unsupported codes along path 73 for any
subsequent process that is able to use the codes. Process 80 uses the
syntax indication in the unsupported codes to skip or avoid
processing those codes.
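The following Python sketch illustrates how a syntax indication at a fixed position lets a process skip an element whose type it does not recognize. The byte layout is hypothetical and assumed only for this illustration: the two high bits of a one-byte code flag attributes and content, attributes are (one-byte name code, length-prefixed value) pairs closed by END, and content items are either TEXT plus a length-prefixed string or a nested element, also closed by END.

```python
# Assumed layout for this sketch; none of these values come from the text.
END, TEXT = 0x00, 0x01
HAS_ATTRS, HAS_CONTENT = 0x80, 0x40


def skip_element(data, pos):
    """Return the position just past the element starting at pos.

    No table of element types is consulted: the extent of the element
    is found from the flag bits and the end markers alone.
    """
    code = data[pos]
    pos += 1
    if code & HAS_ATTRS:
        while data[pos] != END:
            pos += 1                  # skip the attribute name code
            pos += 1 + data[pos]      # skip the length byte and the value
        pos += 1                      # skip the attribute end marker
    if code & HAS_CONTENT:
        while data[pos] != END:
            if data[pos] == TEXT:
                pos += 1              # skip the text marker
                pos += 1 + data[pos]  # skip the length byte and the text
            else:
                pos = skip_element(data, pos)   # nested element
        pos += 1                      # skip the content end marker
    return pos


if __name__ == "__main__":
    # Element type 0x3E is deliberately one this process does not know.
    unknown = (bytes([0xC0 | 0x3E, 9, 2]) + b"xy" + bytes([END, TEXT, 2]) + b"hi"
               + bytes([0x40 | 5, TEXT, 1]) + b"z" + bytes([END, END]))
    stream = unknown + bytes([0x05])
    print(skip_element(stream, 0), len(unknown))
```

In this sketch an older process that lacks a new element type can still step over the element, including its nested elements, and resume with whatever follows.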
[0041] In a further embodiment, decode 74 includes a process similar
to process 80 for generating signals that represent a presentation
for display. In this embodiment, this process in decode 74 uses the
element type and syntax indications conveyed in the codes to
determine which codes should be skipped because, for example, the
display device is unable to respond appropriately to the element
represented by the codes.

Encoding

State Process 

[0042] The encoding process of encode 64 may be discussed in terms of
a state process such as that illustrated in Fig. 4. Each of the
states is represented by a circle. Transitions between states are
represented by lines and occur in the directions indicated by the
arrows.



[0043] The encoding process begins at state 100 (start) and makes a
transition along path 110 to state 101 (encode tag). State 101
generates an encoded representation of a respective element tag. If
the respective element tag is not accompanied by any associated
syntax information, a transition is made along path 111 to state 101
which generates an encoded representation for the subsequent element
tag. If one or more tag attributes are present, a transition is made
along path 112 to state 102 (encode attribute name). If no tag
attributes are present but tag content is present, a transition is
made along path 118 to state 105 (encode content). When no further
element tags are present, a transition is made along path 122 to
state 107 (end) to terminate the encoding process.
[0044] State 102 generates an encoded representation for a respective
attribute name. A transition is made along path 113 to state 103
(encode attribute value). State 103 generates an encoded
representation of the corresponding attribute value. If a subsequent
tag attribute is present, a transition is made along path 114 to
state 102 which generates an encoded representation for the
subsequent tag attribute.
[0045] When no further tag attributes are present, a transition is
made along path 115 to state 104 (attribute end). State 104 generates
a code that signals the end of the tag attributes for the respective
element. If tag content is present, a transition is made along path
117 to state 105 to process the tag content. If no content is
present, a transition is made along path 116 to state 101 to process
a subsequent element tag.
[0046] State 105 generates an encoded representation of a respective
tag content. If subsequent tag content is present, a transition is
made along path 119 to state 105 to process the subsequent content.
When no further tag content is present, a transition is made along
path 120 to state 106 (content end). State 106 generates a code that
signals the end of tag content for the respective element. A
transition is then made along path 121 to state 101 to process a
subsequent element tag.

[0047] As will be explained in more detail below, tag content may
contain nested elements. If a nested element is present, a recursive
transition to state 100 is made along a path that is not illustrated.
When all elements at a particular level of nesting have been
processed, a recursive return transition to state 105 is made along
another path that is not illustrated.
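The state process described in the preceding paragraphs can be traced with a short Python sketch. It records the sequence of state numbers visited rather than producing actual codes. The dictionary representation of elements and the function name trace_states are assumptions made for illustration, and the sketch simplifies nesting by recursing once per nested element rather than once per nesting level.

```python
def trace_states(elements, nested=False):
    """Return the list of state numbers visited while encoding elements.

    Each element is a dict with optional "attrs" (list of (name, value)
    pairs) and "content" (list of str or nested element dicts).
    """
    states = [100]                          # start, or recursive entry
    for element in elements:
        states.append(101)                  # encode tag
        attrs = element.get("attrs", [])
        for _name, _value in attrs:
            states.append(102)              # encode attribute name
            states.append(103)              # encode attribute value
        if attrs:
            states.append(104)              # attribute end
        content = element.get("content", [])
        for item in content:
            states.append(105)              # encode content
            if isinstance(item, dict):
                # Nested element: recursive transition to 100, then
                # recursive return to 105.
                states.extend(trace_states([item], nested=True))
                states.append(105)
        if content:
            states.append(106)              # content end
    if not nested:
        states.append(107)                  # end
    return states


if __name__ == "__main__":
    image = {"tag": "IMG", "attrs": [("src", "x")]}
    bold = {"tag": "B", "content": ["bold"]}
    print(trace_states([image]))
    print(trace_states([bold]))
```

An element with attributes but no content passes through states 102, 103 and 104 and never reaches 105, which mirrors the transition along path 116 back to state 101.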

Example 

[0048] Fig. 7 illustrates a simple document expressed in a markup
language such as HTML. The document is arranged in lines and each
line is numbered for convenient reference in this discussion. The
line numbers do not form part of the markup language. It is
anticipated that in practical embodiments, the document may be
conveyed without any indication of lines or other segmentation other
than that which is provided by the markup language.

[0049] Line 1 contains an <HTML> tag marking the beginning of an HTML
document and the last line contains a </HTML> tag marking the end of
the document. In this example, the <HTML> tag does not have
attributes but it does have content, which is the body of the
document marked by beginning and ending BODY tags on lines 2 and 7,
respectively. The <BODY> tag does not have attributes but does have
content. The content of the <BODY> tag is nested within the content
of the <HTML> tag. The BODY tag content, which is shown in lines 3
through 6, comprises text and several tags.
[0050] The portion of the <BODY> tag content shown in line 3
represents simple text. The portion of the content shown in line 4 is
an element with an IMG tag that has no content but has an attribute
with a name (src) and a value ("^tenugff") specifying the source of
an image for display. The portion of the content shown in line 5 is
text that contains a pair of elements with beginning and ending B
tags marking words for display in a boldfaced font. Neither <B> tag
has an attribute but each has text content. The portion of the
content shown in line 6 is text that contains an element with
beginning and ending A tags. The <A> tag has both an attribute and
content. The tag attribute has a name (href) and a value
("http://a.urf/inf6") specifying the URL of another document. The
content of the <A> tag is the text "here" appearing just before the
ending </A> tag.
[0051] Fig. 8 is a schematic illustration of an encoded
representation obtained by applying the encoding process
discussed above to the document markup language
illustrated in Fig. 7. The encoded representation as
shown in Fig. 8 is arranged in lines that are numbered
for convenient reference in this discussion and are
indented for ease of comprehension. It is anticipated
that, in practical embodiments, encoded information is
generated in a form that does not contain any indication
of lines or other segmentation other than that provided
by the encoded representations of markup language
elements.

[0052] Referring to Fig. 8, the notation {XYZ-AC}
denotes a code that contains an encoded representation
of markup language tag <XYZ> and contains an
indication that one or more tag attributes are present
and that tag content is present. For example, in line 1
the notation {HTML-C} denotes a code that contains an
encoded representation of a <HTML> tag and contains
an indication that no tag attribute is present but tag
content is present. Similarly, the notation {IMG-A} in line 4.1
denotes a code that contains an encoded representation
of a <IMG> tag and contains an indication that one
or more tag attributes are present but no tag content is
present.

[0053] According to the example shown in Fig. 8, the
notation {HTML-C} shown in line 1 denotes the code
that represents the <HTML> tag shown in line 1 of Fig.
7. As explained above, the code conveys the element
type that is represented and it contains an indication
that tag content is present. In line 2, the notation
{BODY-C} denotes a code representing the <BODY> tag (line 2,
Fig. 7) and indicating that tag content is present.
[0054] In line 3, the notation {STR} denotes a special
code that is used to mark the presence of text. The
notation "The item" represents the text itself. This code
always implicitly indicates that tag content is present.
Text may be marked in a variety of ways using either
explicit or implicit codes. For example, the beginning of
a text string may be marked implicitly by reserving certain
values for text characters. Such schemes are generally
context dependent because these reserved
values are likely to occur in fields of binary data, for
example. In preferred embodiments, the beginning of a
text string is marked by an explicit code such as that
represented by the notation {STR} shown in the figure.
The end of a text string may be marked explicitly by a
special character such as a null or binary zero, explicitly
by an express length value included with the beginning
code, or implicitly by a code that is not a valid text
character. No particular scheme is critical to the practice of
the present invention.
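Two of the explicit end-of-string schemes mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the numeric value chosen for the {STR} code, the one-byte length field and the ASCII text encoding are hypothetical, since the description assigns no particular values.

```python
STR = 0x03  # hypothetical value for the {STR} code (not given in the description)

def encode_null_terminated(text):
    """Emit {STR}, the text bytes, then a binary zero marking the end."""
    data = text.encode("ascii")
    if 0 in data:
        raise ValueError("text must not contain the terminator value")
    return bytes([STR]) + data + b"\x00"

def encode_length_prefixed(text):
    """Emit {STR}, an assumed one-byte length value, then the text bytes."""
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("a one-byte length limits the text to 255 bytes")
    return bytes([STR, len(data)]) + data

def decode_null_terminated(buf, pos=0):
    """Return the text and the position just past the terminator."""
    assert buf[pos] == STR
    end = buf.index(0, pos + 1)  # first binary zero after the {STR} code
    return buf[pos + 1:end].decode("ascii"), end + 1

def decode_length_prefixed(buf, pos=0):
    """Return the text and the position just past the last text byte."""
    assert buf[pos] == STR
    length = buf[pos + 1]
    start = pos + 2
    return buf[start:start + length].decode("ascii"), start + length
```

The length-prefixed form lets a decoder skip the string without scanning it, while the null-terminated form needs no length limit; either is consistent with the statement that no particular scheme is critical.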

[0055] Lines 4.1 through 4.3 collectively represent the
encoded representation of the document element
shown in line 4 of Fig. 7. In line 4.1, the notation
{IMG-A} denotes a code representing the <IMG> tag and
indicating that one or more tag attributes are present. In line
4.2, the notation {src} denotes a code representing the
name "src" of the tag attribute. This code may be a
compressed representation of the name itself as discussed
more fully below or it may be a generic attribute code
indicating that the attribute name is specified in some
other form such as a conventional text string. The
notation ("/item.gif") denotes a conventional text string
providing the value of the attribute. Alternatively, the
attribute value could be encoded into some other form
such as a binary code. In line 4.3, the notation {END:
img-a} denotes a code marking the end of the tag
attributes for the <IMG> tag. In one embodiment of the
present invention, one code is used to mark the end of
attributes and another code is used to mark the end of
content. In another embodiment, different codes are
used according to element type. In yet another
embodiment, different codes are used to mark the end of
attributes and content according to element type.
Referring to the example shown in line 4.3, according to these
embodiments, the notation "img-a" may be understood
to represent a unique {END} code for marking the end of
IMG attributes. In a preferred embodiment, however,
one specific code such as a null or zero value is used to
mark the end of attributes and content for all types of
tags. For this embodiment, the notation "img-a" may be
understood to be merely a convenience for the reader
showing the correspondence between codes for tag
attributes, content and end codes.
[0056] Lines 5.1 through 5.5 collectively represent the
encoded representation of the document contents
shown in line 5 of Fig. 7. In lines 5.1 and 5.3, the
notation {STR} and the accompanying text denote codes
and text that represent two text strings shown in line 5 of
Fig. 7.

[0057] Lines 5.2.1 through 5.2.3 collectively represent
the encoded representation of the first <B> element in
line 5 of Fig. 7. In line 5.2.1, the notation {B-C} denotes
a code representing the <B> tag and indicating that
content is present. In line 5.2.2, the text content is
represented by the notation {STR} "red" as explained
above. In line 5.2.3, the notation {END: b-c} denotes a
code that marks the end of the content for the <B> tag.
Similarly, lines 5.4.1 through 5.4.3 collectively represent
the encoded representation of the second <B> element
in line 5 of Fig. 7.

[0058] In line 5.5, the notation {STR} "for a limited
time." represents an encoding of a text string as
explained above and completes the encoded representation
of the document contents shown in line 5 of Fig.
7. According to the example shown in Fig. 8, the
encoded representation for the document contents in
line 6 of Fig. 7 is shown in lines 6.1.1 through 6.3,
collectively. In a practical embodiment of the present
invention, however, the adjacent text strings "for a limited
time." and "Click" could be combined into one encoded
representation denoted by {STR} "for a limited time.
Click".

[0059] As just explained, lines 6.1.1 through 6.3
collectively represent the encoded representation of the
document contents shown in line 6 of Fig. 7. As
discussed, the notation {STR} "Click" in line 6.1.1 denotes
the encoding of a text string. In line 6.1.2, the notation
{A-AC} denotes a code representing the <A> tag and
indicating that tag attributes and content are present. In
line 6.1.3, the notation {href} ("http://a.url/info") denotes
an encoding that represents the name and value of the
tag attribute. In line 6.1.4, the notation {END: a-a}
denotes a code that marks the end of the attributes for
the <A> tag. In line 6.2.1, the notation {STR} "here"
denotes an encoding of a text string that is the tag
content. In line 6.2.2, the notation {END: a-c} marks the end
of the content for the <A> tag. The notation in line 6.3
represents a text string, which completes the encoded
representation of the document contents shown in line 6
of Fig. 7.

[0060] In lines 7 and 8, the notations {END: body-c} and
{END: html-c} denote codes that mark the end of the
content for the <BODY> and <HTML> tags, respectively.
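The walk-through above can be reproduced with a short recursive routine that emits the same notation. The sketch below is an illustration only: the Element class and the encode function are hypothetical names, and the sample document is a reduced fragment built only from strings quoted in this discussion, not the full document of Fig. 7.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    tag: str
    attrs: dict = field(default_factory=dict)
    content: list = field(default_factory=list)  # items are str or Element

def encode(node, out=None):
    """Append Fig. 8 style notation for node to out and return the list."""
    if out is None:
        out = []
    if isinstance(node, str):
        out.append('{STR} "%s"' % node)
        return out
    # The suffix records which syntax information follows the code.
    flags = ("A" if node.attrs else "") + ("C" if node.content else "")
    out.append("{%s%s}" % (node.tag, "-" + flags if flags else ""))
    name = node.tag.lower()
    if node.attrs:
        for key, value in node.attrs.items():
            out.append('{%s} ("%s")' % (key, value))
        out.append("{END: %s-a}" % name)   # end of attributes
    if node.content:
        for child in node.content:
            encode(child, out)             # nested elements recurse
        out.append("{END: %s-c}" % name)   # end of content
    return out

doc = Element("HTML", content=[
    Element("BODY", content=[
        Element("IMG", attrs={"src": "/item.gif"}),
        Element("B", content=["red"]),
        "Click",
        Element("A", attrs={"href": "http://a.url/info"}, content=["here"]),
    ]),
])
```

Calling encode(doc) yields a list that begins with {HTML-C} and {BODY-C}, contains the {IMG-A}, {B-C} and {A-AC} groups in order, and ends with {END: body-c} and {END: html-c}.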

Compression 

[0061] A variety of encoding or compression schemes
may be used to generate codes having information
capacity requirements that are lower than the information
capacity requirement of the document element, or
portion of a document element, that is represented by a
code. The codes are generated to convey both the type
of document element that is represented and an indication
whether syntax information is present in the document
element. The indication of syntax information is
conveyed in a predefined position relative to the
beginning of the code.

[0062] According to a preferred embodiment of the
invention, codes have a fixed length of one byte (8
binary bits) in which one or more bits, say the two most
significant bits, are reserved to indicate whether syntax
information is absent or present. For HTML, for example,
two bits may be reserved to indicate the presence
or absence of one or more tag attributes and tag
content, respectively. Other code structures are possible
including codes that are variable in length. For example,
a code could include a variable length indication of
element type generated by Huffman encoding and a
separate indication of syntax information. The indication of
syntax information may be placed in any predefined
position relative to the beginning of the code.
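The fixed-length layout described above can be sketched as a pair of pack and unpack helpers. The assignment of the most significant bit to attributes and the next bit to content is an assumption made for illustration; the description only states that two bits may be reserved for these two indications.

```python
ATTR_FLAG = 0x80     # assumed: most significant bit marks tag attributes present
CONTENT_FLAG = 0x40  # assumed: next bit marks tag content present
TYPE_MASK = 0x3F     # remaining six bits carry the element type

def pack(element_type, has_attrs, has_content):
    """Build a one-byte code from a six-bit element type and two flags."""
    if not 0 <= element_type <= TYPE_MASK:
        raise ValueError("element type must fit in six bits")
    code = element_type
    if has_attrs:
        code |= ATTR_FLAG
    if has_content:
        code |= CONTENT_FLAG
    return code

def unpack(code):
    """Return (element type, attributes present, content present)."""
    return code & TYPE_MASK, bool(code & ATTR_FLAG), bool(code & CONTENT_FLAG)
```

Because the flags sit at the same bit positions for every code, a routine can test code & CONTENT_FLAG without consulting any table of element types.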

[0063] Rules may be established to allow the predefined
position to vary according to element type. For
example, the position may be defined to immediately follow
a variable-length indication of element type. As
another example, one position could be predefined for a
class of special codes, say those that have one of several
specified values, and another position could be
predefined for other codes. The predefined position is fixed
independent of element type in preferred embodiments.



[0064] In a preferred embodiment, a class of six special
codes is established. These special codes are
referred to as "global codes" because, according to this
embodiment, all encoders and decoders must be able
to correctly interpret and process these codes. These
six codes are discussed below.
[0065] A special code denoted {CBK} marks a value
that specifies a code book that has been adaptively
selected from a plurality of code books. Decoding is
performed according to the selected code book. As
explained briefly above, fixed length 8-bit codes are
used to convey both element type and an indication of
syntax information. If two bits are used to convey the
indication of syntax information, only six bits remain to
convey element type. Generally, the number of elements
far exceeds what can be expressed in six bits.
This limitation is even more severe because it is desirable
to use these codes to also represent frequently used
attribute names and/or attribute values. By organizing
codes into a plurality of code books and selecting an
appropriate code book from this plurality, the size of the
coding space can be extended significantly. When an
encoder selects a code book, an indication of the selection
is assembled into the encoded information so that a
complementary decoder can determine which code
book should be used for decoding. The {CBK} code is
used to mark this indication.
[0066] A special code denoted {CHR} marks a value
that specifies a character. For example, documents that
are represented by text that conforms to the American
Standard Code for Information Interchange (ASCII)
cannot represent some of the characters defined in
Unicode text. Any Unicode character can be represented
by a numeric value marked by the {CHR} code.
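The code-book selection marked by the {CBK} code can be sketched as follows. Everything numeric here is an assumption for illustration: the value of {CBK}, the one-byte book index that follows it, and the contents of the two code books are hypothetical, and the syntax-indication bits are left at zero to keep the example short.

```python
CBK = 0x01  # hypothetical value for the {CBK} global code

BOOKS = [   # hypothetical code books: tag name -> six-bit element type
    {"HTML": 0x05, "BODY": 0x06, "B": 0x07},
    {"IMG": 0x05, "A": 0x06},
]

def encode_tags(tags):
    """Encode tag names, emitting {CBK} plus a book index on each switch."""
    out, current = bytearray(), 0
    for tag in tags:
        if tag not in BOOKS[current]:
            # Select the first book that holds the tag and announce it.
            current = next(i for i, book in enumerate(BOOKS) if tag in book)
            out += bytes([CBK, current])
        out.append(BOOKS[current][tag])
    return bytes(out)

def decode_tags(data):
    """Recover tag names, following each {CBK} selection in the stream."""
    names = [{value: key for key, value in book.items()} for book in BOOKS]
    tags, current, pos = [], 0, 0
    while pos < len(data):
        if data[pos] == CBK:
            current = data[pos + 1]   # the value marked by {CBK}
            pos += 2
        else:
            tags.append(names[current][data[pos]])
            pos += 1
    return tags
```

The same six-bit value (0x05, for instance) stands for different elements in different books, which is how the coding space grows beyond 64 entries. The sketch assumes, consistent with the notion of global codes, that the {CBK} value is never reused as an element type in any book.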
[0067] A special code denoted {DAT} marks the start
of "opaque" data that is not to be processed by the
decoder. The data is said to be opaque in the sense that
the internal structure of the data need not be known to
the encoder. Opaque data is marked and included in the
encoded information without modification. The extent of
the opaque data is conveyed by a length value that
accompanies the {DAT} code.
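A minimal sketch of the {DAT} mechanism follows. The numeric value of {DAT} and the two-byte big-endian length field are assumptions chosen for illustration; the description requires only that a length value accompany the code.

```python
DAT = 0x02  # hypothetical value for the {DAT} global code

def wrap_opaque(payload):
    """Emit {DAT}, an assumed two-byte length, then the payload unmodified."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for a two-byte length")
    return bytes([DAT]) + len(payload).to_bytes(2, "big") + payload

def read_opaque(buf, pos):
    """Return (payload, next position) for the {DAT} block starting at pos."""
    assert buf[pos] == DAT
    length = int.from_bytes(buf[pos + 1:pos + 3], "big")
    start = pos + 3
    return buf[start:start + length], start + length
```

Because the extent is given by the length value, the payload may contain byte values that coincide with other codes, such as a zero used as an end marker, without being misread.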
[0068] A special code denoted {END} marks the end
of certain elements and syntax information as described
above.

[0069] A special code denoted {STR} marks the start
of a text string as described above.
[0070] A special code denoted {UNK} marks an
unknown element type. The use of this code improves
the ability of existing encoders and decoders to process
documents that contain elements that were undefined
at the time the encoders and decoders were implemented.
An older encoder can pass along the unknown
element in a form that allows a more recent decoder to
receive and process the new element. An older decoder
working in conjunction with an older encoder is able to
skip the element marked by the {UNK} code and
resume processing other known codes.
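The skipping behaviour described for {UNK} can be sketched over an abstract token stream. The tuple representation below is a hypothetical stand-in for the encoded bytes: ("code", name, has_attrs, has_content) for an element code, ("attr", name, value) for an attribute, ("str", text) for a text string and ("end",) for an {END} code. The point illustrated is that the skip routine relies only on the syntax indications and {END} codes, never on the element type.

```python
def skip_element(tokens, pos):
    """Return the index just past the element whose code is at tokens[pos]."""
    kind, _name, has_attrs, has_content = tokens[pos]
    assert kind == "code"
    pos += 1
    if has_attrs:
        while tokens[pos][0] != "end":
            pos += 1                              # attribute entries
        pos += 1                                  # END of attributes
    if has_content:
        while tokens[pos][0] != "end":
            if tokens[pos][0] == "code":
                pos = skip_element(tokens, pos)   # nested element
            else:
                pos += 1                          # text string
        pos += 1                                  # END of content
    return pos

def drop_unknown(tokens):
    """Copy the stream, omitting every element marked by the UNK code."""
    out, pos = [], 0
    while pos < len(tokens):
        tok = tokens[pos]
        if tok[0] == "code" and tok[1] == "UNK":
            pos = skip_element(tokens, pos)
        else:
            out.append(tok)
            pos += 1
    return out
```

A decoder built this way resumes with the first token after the unknown element, including when that element carries attributes or nested content of its own.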

Decoding 

[0071] The decoding process of decode 74 may be
discussed in terms of a state process such as that
illustrated in Fig. 5. Each of the states is represented by a
circle. Transitions between states are represented by
lines and occur in the directions indicated by the arrows.
[0072] The decoding process begins at state 130
(start) and makes a transition along path 140 to state 131
(decode tag). State 131 generates a decoded representation
of a respective element tag that is derived from a
respective code. If the respective code indicates that no
syntax information is present, a transition is made along
path 141 to state 131, which generates a decoded
representation derived from a subsequent code. If the code
indicates that one or more tag attributes are present, a
transition is made along path 142 to state 132 (decode
attribute name). If the code indicates that no tag
attribute is present but that tag content is present, a
transition is made along path 148 to state 135 (decode
content). When no further element tags are present, a
transition is made along path 152 to state 137 (end) to
terminate the decoding process.
[0073] State 132 generates a decoded representation
of a respective attribute name. A transition is made
along path 143 to state 133 (decode attribute value).
State 133 generates a decoded representation of the
corresponding attribute value. If a subsequent tag
attribute is present, a transition is made along path 144 to
state 132, which generates a decoded representation of
the subsequent tag attribute.

[0074] When no further tag attributes are present, if
tag content is present, a transition is made along path 147 to
state 135 to process the tag content. If no content is
present, a transition is made along path 146 to state 131
to process a subsequent code.

[0075] State 135 generates a decoded representation
of a respective tag content. If subsequent tag content is
present, a transition is made along path 149 to state 135
to process the subsequent content. When no further tag
content is present, a transition is made along path 151
to state 131 to process a subsequent code.
[0076] As explained above, tag content may
contain nested codes. If a nested code is present, a
recursive transition to state 130 is made along a path
that is not illustrated. When all codes at a particular level
of nesting have been processed, a recursive return
transition to state 135 is made along another path that is not
illustrated.
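The decoding states described above can be sketched as a small state machine. This is an illustration under assumptions, not the decoder of Fig. 5 itself: the token tuples are a hypothetical stand-in for encoded bytes, with ("code", tag, has_attrs, has_content), ("name", n), ("value", v), ("str", text) and ("end",), and the recursive transitions that are not illustrated are modelled with an explicit stack of open tags. Text outside any element is not handled.

```python
def decode(tokens):
    """Rebuild markup text from a token stream using four decoding states."""
    out, open_tags = [], []
    state, pos, pending = "TAG", 0, None

    def finish_start_tag(tag, has_content):
        out.append(">")
        if has_content:
            open_tags.append(tag)
            return "CONTENT"
        # No content: resume the parent's content, or the next top-level tag.
        return "CONTENT" if open_tags else "TAG"

    while pos < len(tokens):
        tok = tokens[pos]
        if state == "TAG":                        # state 131 (decode tag)
            _, tag, has_attrs, has_content = tok
            out.append("<" + tag)
            pending = (tag, has_content)
            state = "NAME" if has_attrs else finish_start_tag(tag, has_content)
            pos += 1
        elif state == "NAME":                     # state 132 (attribute name)
            if tok[0] == "end":
                state = finish_start_tag(*pending)
            else:
                out.append(" " + tok[1] + "=")
                state = "VALUE"
            pos += 1
        elif state == "VALUE":                    # state 133 (attribute value)
            out.append('"' + tok[1] + '"')
            state = "NAME"
            pos += 1
        elif tok[0] == "str":                     # state 135 (content): text
            out.append(tok[1])
            pos += 1
        elif tok[0] == "end":                     # state 135: end of content
            out.append("</" + open_tags.pop() + ">")
            state = "CONTENT" if open_tags else "TAG"
            pos += 1
        else:                                     # nested code: re-enter 131
            state = "TAG"
    return "".join(out)
```

The syntax indication carried by each code is what selects the next state, so the machine never needs a per-element table to know whether attributes or content follow.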

Recursion

[0077] The state diagrams illustrated in Figs. 4 and 5
do not show any provision for recursion. Recursion is
not required to practice the present invention but it is an
efficient technique in many embodiments for processing
nested elements and codes. A functional flow diagram
illustrated in Fig. 6 represents a recursive process for
either encoding or decoding document elements
expressed in a markup language such as HTML.

Encoding 

[0078] According to the illustrated process for encoding,
step 221 performs various initialization tasks. Step
222 initializes the recursion level to zero. Step 223
processes element tags to generate an encoded representation.
Step 224 interrogates whether any tag attributes
are present. If so, step 225 processes the tag attribute
to generate an encoded representation and then
returns to step 224 to interrogate whether any other tag
attributes are present. When no further tag attributes
are present, the process continues with step 226.
[0079] Step 226 interrogates whether tag content is
present. If so, step 227 processes the tag content to
generate an encoded representation. Step 228 interrogates
whether any elements are nested within the tag content.
If not, the process returns to step 226 to interrogate
whether any other tag content is present. When no further
tag content is present, the process continues with
step 230. If an element is nested within the tag content,
step 229 increments the recursion level and the process
continues with step 223.

[0080] Step 230 interrogates whether the current
recursion level is zero. If it is not zero, step 231
decrements the recursion level and the process continues
with step 226. If the recursion level is zero, step 232
interrogates whether the encoding process is done. If
not, the process returns to step 223. If the encoding
process is done, step 233 performs various termination
tasks.

Decoding 

[0081] According to the illustrated process for decoding,
step 221 performs various initialization tasks. Step
222 initializes the recursion level to zero. Step 223
processes codes to generate a decoded representation.
Step 224 interrogates whether any tag attributes are
present. If so, step 225 processes the code representing
the tag attribute to generate a decoded representation
and then returns to step 224 to interrogate whether
any other tag attributes are present. When no further
tag attributes are present, the process continues with
step 226.

[0082] Step 226 interrogates whether tag content is
present. If so, step 227 processes the code representing
the tag content to generate a decoded representation.
Step 228 interrogates whether any codes are nested
within the encoded tag content. If not, the process
returns to step 226 to interrogate whether any other tag
content is present. When no further tag content is
present, the process continues with step 230. If a code
is nested within the encoded tag content, step 229
increments the recursion level and the process continues
with step 223.

[0083] Step 230 interrogates whether the current
recursion level is zero. If it is not zero, step 231
decrements the recursion level and the process continues
with step 226. If the recursion level is zero, step 232
interrogates whether the decoding process is done. If
not, the process returns to step 223. If the decoding
process is done, step 233 performs various termination
tasks.

Claims 

1. A method for reducing capacity requirements of
input information representing a document, the
method comprising:-

receiving the input information representing the
document and identifying a plurality of elements
therein, wherein each element has a
respective type and wherein at least some of
the elements have syntax information representing
one or more respective syntactical
characteristics;

generating a plurality of codes, a respective
code having a beginning and representing at
least a portion of a respective element in a form
having an information capacity requirement
that is lower than the information capacity
requirement of the represented portion,
wherein the respective code conveys the
respective element type and a syntax indication
indicating presence or absence of syntax
information for the respective element, and
wherein the respective code conveys the syntax
indication in a predefined position relative
to the beginning of the respective code; and

generating encoded information representing
the document by assembling the plurality of
codes and portions of the plurality of elements
not represented by the plurality of codes into a
form suitable for transmission or storage.

2. A method according to claim 1 that further comprises
selecting a code book from a plurality of code
books, wherein at least some of the codes are generated
according to the selected code book and an
indication of the selected code book is assembled
into the encoded information.

3. A method for recovering decoded information representing
a document from encoded information,
wherein the document comprises a plurality of elements,
the method comprising

receiving encoded information representing the
document and identifying a plurality of codes
therein, wherein a respective code has a beginning,
represents at least a portion of a respective
element, conveys a respective type
indication indicating the respective element
type and conveys a respective syntax indication
indicating presence or absence of syntax
information representing one or more syntactical
characteristics of the respective element;

obtaining respective syntax indications from
respective codes at predefined positions relative
to the beginning of the respective codes;

generating a plurality of decoded representations,
wherein a respective decoded representation
is derived from a respective code and
corresponds to the portion of the respective
element that is represented by the respective
code, wherein the respective syntax indication
controls generation of decoded representations
that represent syntax information, and
wherein the respective decoded representation
is derived in a manner such that information
capacity requirements of the respective
decoded representation is greater than information
capacity requirements of the respective
code; and

assembling the plurality of decoded representations
and portions of the plurality of elements
not represented by the codes to generate output
information representing the document.

4. A method according to any preceding claim
wherein the elements conform to a tag-based
markup language, each element comprising a
markup language tag, and wherein the syntax information
includes tag attribute and tag content.

5. A method according to claim 4 wherein the tag-based
markup language conforms to a Standard
Generalized Markup Language (SGML) Document
Type Definition (DTD).

6. A method according to any preceding claim
wherein the codes have a form such that the syntax
indication indicates presence or absence of syntax
information in a manner that is independent of element
type.

7. A method according to any preceding claim as
dependent on claim 4 that further comprises generating
signals representing a presentation for display
on a device by processing the output information
according to the elements therein, wherein the
processing uses the syntax indication of one or
more elements in said output information to avoid
processing syntax information that otherwise would
affect one or more characteristics of the presentation.

8. A method according to any preceding claim as
dependent on claim 4 that further comprises generating
signals representing a presentation for display
on a device by processing the encoded information
according to the codes therein, wherein the
processing uses the syntax indication of one or
more codes in said encoded information to avoid
processing syntax information that otherwise would
affect one or more characteristics of the presentation.

9. A method according to any preceding claim as
dependent on claim 4 wherein the encoded information
includes one or more instances of an unsupported
code from which respective decoded
representations are not derived, and wherein the
output information is generated by also assembling
the one or more instances of the unsupported code.

10. A method according to any preceding claim
wherein the codes have a fixed length and convey
the syntax indication at a fixed position relative to
the beginning of the codes.

11. A method according to any preceding claim as
dependent on claim 4 wherein the encoded information
includes an indication of a selected code
book that is selected from a plurality of code books
and wherein at least some of the decoded representations
are derived from the codes according to
the selected code book.

12. A method for recovering decoded information representing
a document from encoded information
comprising a plurality of encoded elements in a
compressed form, the method comprising:-

processing an encoded element to identify element
type and to obtain a syntax indication of
element syntactical characteristics, wherein
the syntax indication is obtained from a predefined
position within the encoded element relative
to the encoded element beginning and a
compressed representation of the element type
is expanded into an uncompressed form of a
markup language tag;

wherein if the syntax indication indicates that at
least one tag attribute is present, processing
tag attribute information in the encoded element
by expanding a compressed representation
of the tag attribute information into an
uncompressed form of a markup language
tag-attribute name or a tag-attribute value; and

wherein if the syntax indication indicates that
tag content is present, processing the tag content
information in the encoded element
according to a process appropriate for the tag
content.

13. A method according to claim 12 wherein the
markup language tag conforms to a Standard
Generalized Markup Language (SGML) Document
Type Definition (DTD).

14. A method according to claim 12 or 13 wherein the
encoded elements have a form such that the syntax
indication indicates presence or absence of syntax
information in a manner that is independent of element
type.

15. A method according to any one of claims 12
through 14 that further comprises generating signals
representing a presentation for display on a
device by processing the output information according
to the elements therein, wherein the processing
uses the syntax indication of one or more elements
in said output information to avoid processing tag
attribute information or tag content that otherwise
would affect one or more characteristics of the
presentation.

16. A method according to any one of claims 12
through 15 wherein the encoded information
includes one or more instances of encoded elements
of an unsupported type that are not
expanded into an uncompressed form of a markup
language tag, and wherein the output information is
generated by also assembling the one or more
instances of encoded elements of the unsupported
type.

17. A method according to any one of claims 12
through 16 wherein the encoded information
includes an indication of a selected code book that
is selected from a plurality of code books and
wherein the markup language tag, tag-attribute
name or tag-attribute value is expanded into an
uncompressed form according to the selected code
book.

18. A method according to any one of claims 12
through 17 wherein the compressed representation
of element type has a fixed length and the syntax
indication is conveyed at a fixed position within the
compressed representation of element type.